
Fix for unmarshalling error for keepequal/dropequal #2794

Merged

Conversation

@rashmichandrashekar (Contributor) commented Mar 27, 2024

Description:
Fixes a bug where the scrape config returned by the targetAllocator crashes the otel MetricsReceiver when a relabel or metric relabel config uses action: keepequal or action: dropequal.

Link to tracking Issue(s):
#2793

Testing:
Tested that it doesn't crash the MetricsReceiver.

@rashmichandrashekar requested review from a team March 27, 2024 00:35

// Load the JSON-serialized scrape configs into a generic map keyed by job name.
var jobToScrapeConfig map[string]interface{}
err = json.Unmarshal(jsonConfig, &jobToScrapeConfig)
if jobToScrapeConfig != nil {

Contributor

Instead of doing this scary looking surgery, we should probably do something similar to what prometheus-operator does, and manually build the scrapeconfig out of appropriate yaml structs: https://github.com/prometheus-operator/prometheus-operator/blob/4e3b2bcea44cbfdbd5c4975d777240285721dc6b/pkg/prometheus/promcfg.go#L1645/
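For reference, a minimal sketch of that style (illustrative only, not the operator's actual code): the scrape config is built by hand from gopkg.in/yaml.v2 MapSlice values, so serialization never depends on the Prometheus types' own marshalers. The job name and label names below are made up.

```go
package main

import (
	"fmt"

	yaml "gopkg.in/yaml.v2"
)

func main() {
	// Build one relabel rule as an ordered list of key/value pairs.
	relabeling := yaml.MapSlice{
		{Key: "source_labels", Value: []string{"__meta_kubernetes_pod_annotation_prometheus_io_port"}},
		{Key: "target_label", Value: "__meta_kubernetes_pod_container_port_number"},
		{Key: "action", Value: "keepequal"},
	}

	// Assemble the scrape config the same way, field by field.
	scrapeConfig := yaml.MapSlice{
		{Key: "job_name", Value: "example"},
		{Key: "relabel_configs", Value: []yaml.MapSlice{relabeling}},
	}

	out, err := yaml.Marshal(scrapeConfig)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```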

Contributor Author

Thanks @swiatekm-sumo. But that would mean that whenever the scrape config changes upstream, we would need to update this manually as well, which seems tedious. Also, I don't see any libraries I can reuse. Until this is fixed in Prometheus, could this be considered as a fix? The Marshal implementation simplifies the conversion, and we can rely on the interface implementation to reliably convert to yaml. Once the upstream fix lands, we can just remove this.
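A rough, self-contained sketch of that idea (not the PR diff; the label names are made up): let the Prometheus types' own yaml marshalers do the conversion, then read the result back into a generic map instead of hand-building structures.

```go
package main

import (
	"fmt"

	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/model/relabel"
	yaml "gopkg.in/yaml.v2"
)

func main() {
	// Start from the package defaults so optional fields keep sensible values.
	rc := relabel.DefaultRelabelConfig
	rc.SourceLabels = model.LabelNames{"__meta_kubernetes_pod_annotation_prometheus_io_port"}
	rc.TargetLabel = "__meta_kubernetes_pod_container_port_number"
	rc.Action = relabel.KeepEqual

	// Serialize via the MarshalYAML implementations defined on the Prometheus types.
	raw, err := yaml.Marshal(&rc)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(raw))

	// Load the result into a generic map that can be post-processed or served,
	// without duplicating Prometheus' config structs in this repository.
	var generic map[string]interface{}
	if err := yaml.Unmarshal(raw, &generic); err != nil {
		panic(err)
	}
	fmt.Println("action:", generic["action"])
}
```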

Contributor Author

@swiatekm-sumo - Following up. Please let me know your thoughts. Thanks!

Contributor

I understand that your approach fixes this specific problem, but it still leaves us uncertain whether we have other, similar problems that we have yet to encounter. I'm not against merging this specific fix as-is, but I'd like to look for a more systemic solution, where we can be assured our serialized scrape configs are correct. @jaronoff97 @pavolloffay wdyt?

Contributor Author

Thanks @swiatekm-sumo. As part of this fix, I also tested all the relabel actions available in the Prometheus configuration. These two were the only actions that cause problems, and since this is tied to a specific bug, it is unlikely to recur. If other marshalling issues do come up, they would also affect all Prometheus users who depend on it, and the ideal fix should come from Prometheus's own marshalling implementation. We should also be able to reliably depend on the native Prometheus marshalling/unmarshalling implementation without duplicating effort. Please let me know what you think. Thanks again!

Contributor

Are we able to write unit tests checking this?

Contributor Author

Sure @swiatekm-sumo. I have added tests checking that unmarshalling doesn't result in errors for any supported Prometheus relabel action. Please take a look. Thank you!
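For the record, a standalone sketch in the same spirit as the tests described here (not the PR's actual test file): every supported relabel action is unmarshalled through Prometheus' relabel.Config and must not return an error. The YAML snippets are illustrative.

```go
package sketch

import (
	"testing"

	"github.com/prometheus/prometheus/model/relabel"
	"github.com/stretchr/testify/require"
	yaml "gopkg.in/yaml.v2"
)

func TestRelabelActionsUnmarshal(t *testing.T) {
	// One minimal, valid config per relabel action supported by Prometheus.
	cases := map[relabel.Action]string{
		relabel.Replace:   "source_labels: [__name__]\ntarget_label: name\naction: replace\n",
		relabel.Keep:      "source_labels: [__name__]\nregex: up\naction: keep\n",
		relabel.Drop:      "source_labels: [__name__]\nregex: up\naction: drop\n",
		relabel.HashMod:   "source_labels: [__name__]\nmodulus: 8\ntarget_label: __tmp_hash\naction: hashmod\n",
		relabel.LabelMap:  "regex: __meta_(.+)\naction: labelmap\n",
		relabel.LabelDrop: "regex: __tmp_.+\naction: labeldrop\n",
		relabel.LabelKeep: "regex: job|instance\naction: labelkeep\n",
		relabel.Lowercase: "source_labels: [__name__]\ntarget_label: name\naction: lowercase\n",
		relabel.Uppercase: "source_labels: [__name__]\ntarget_label: name\naction: uppercase\n",
		relabel.KeepEqual: "source_labels: [__name__]\ntarget_label: __name__\naction: keepequal\n",
		relabel.DropEqual: "source_labels: [__name__]\ntarget_label: __name__\naction: dropequal\n",
	}

	for action, doc := range cases {
		var cfg relabel.Config
		require.NoErrorf(t, yaml.Unmarshal([]byte(doc), &cfg), "action %s should unmarshal cleanly", action)
	}
}
```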

@swiatekm swiatekm (Contributor) left a comment

I think I'm OK with this fix, although I'd like to open an issue to find a better solution as well. But I'd like @pavolloffay and @jaronoff97 to give their opinions too.

@swiatekm swiatekm (Contributor) left a comment

Please fix the linter error, otherwise this looks good to me.

@swiatekm swiatekm (Contributor) left a comment

Thanks for fixing this, and being patient with my feedback. As I said earlier, I consider this a temporary fix until we can figure out a more robust way of dealing with the problem. However, fixing the bug is more important than other considerations here in my view.

@jaronoff97 (Contributor)

Could you add a comment to the function signature explaining why we need to do this, linking this issue prometheus/prometheus#12534

@rashmichandrashekar (Contributor Author)

> Thanks for fixing this, and being patient with my feedback. As I said earlier, I consider this a temporary fix until we can figure out a more robust way of dealing with the problem. However, fixing the bug is more important than other considerations here in my view.

Thanks @swiatekm-sumo. Your feedback makes sense to me.

@rashmichandrashekar (Contributor Author)

> Could you add a comment to the function signature explaining why we need to do this, linking this issue prometheus/prometheus#12534

Thanks @jaronoff97. Added the comment. Please take a look.
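Purely as an illustration of the kind of comment being asked for (the function name, signature, and body below are placeholders, not the actual server.go code):

```go
package sketch

import (
	promconfig "github.com/prometheus/prometheus/config"
	yaml "gopkg.in/yaml.v2"
)

// marshalScrapeConfigs serializes the scrape configs via the Prometheus yaml
// marshalers, because scrape configs that use the keepequal/dropequal relabel
// actions otherwise fail to unmarshal on the collector side.
// See prometheus/prometheus#12534; once that is fixed upstream, this
// workaround can be removed.
func marshalScrapeConfigs(configs map[string]*promconfig.ScrapeConfig) ([]byte, error) {
	return yaml.Marshal(configs)
}
```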

@rashmichandrashekar (Contributor Author)

@jaronoff97 - I have committed your suggestion. Please merge if all looks good. Thanks again!

@jaronoff97 (Contributor) commented Apr 10, 2024

@rashmichandrashekar before I merge, did you confirm this worked by running the TA locally and the scrape_configs response worked with the collector? If not, it may be helpful to add an e2e here...

@rashmichandrashekar (Contributor Author)

> @rashmichandrashekar before I merge, did you confirm this worked by running the TA locally and the scrape_configs response worked with the collector? If not, it may be helpful to add an e2e here...

@jaronoff97 - Yes, I did test out that this works by running TA and collector locally :)

@jaronoff97 jaronoff97 merged commit e107ffe into open-telemetry:main Apr 11, 2024
31 checks passed
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request May 1, 2024
rubenvp8510 pushed a commit to rubenvp8510/opentelemetry-operator that referenced this pull request May 7, 2024